1.  Introduction.


Let $\omega$ denote an experimental unit drawn randomly from a population
$\pi$. The classification problem in its standard form is to devise
rules so as to identify $\pi$ with one of the two given "distinct"
populations $\pi_1$ and $\pi_2$. A set of $p$ real-valued measurements
$X$: $p \times 1$ is observed on $\omega$ and it is believed that the
distributions of $X$ in these two populations are different. In this paper
we shall assume that $X \sim N_p(\mu, \Sigma)$.

Let $\mu_i$ denote the mean of $X$ in the population $\pi_i$ $(i = 1,2)$,
where $\mu_1 \neq \mu_2$. The classification problem is to find "good" rules
for deciding whether $\mu = \mu_1$ or $\mu = \mu_2$. When all the parameters
$\mu_1$, $\mu_2$ and $\Sigma$ are known, Wald's decision theory [^] may be
used to derive the minimal complete class of decision rules for the zero-one
loss function. It is given by the following, except for sets of measure
zero [>]: The rule $\phi^k$ decides $\mu = \mu_1$ iff

(1.1)  $(x - \mu_1)' \Sigma^{-1} (x - \mu_1) - (x - \mu_2)' \Sigma^{-1} (x - \mu_2) < k.$

It can be proved [2] that the rule $\phi^0$ is the only admissible
minimax rule.
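
As a rough illustration of rule (1.1), the following Python sketch evaluates the two quadratic forms for known parameters and compares their difference with $k$. The function names (`solve`, `quad_form`, `decide_known`) and the numerical values are illustrative assumptions, not part of the source; a small Gauss-Jordan solver stands in for $\Sigma^{-1}$ so that the block needs no external library.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def quad_form(x, mu, sigma):
    """Return (x - mu)' Sigma^{-1} (x - mu)."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    return sum(di * zi for di, zi in zip(d, solve(sigma, d)))


def decide_known(x, mu1, mu2, sigma, k=0.0):
    """Return 1 if rule (1.1) decides mu = mu1, otherwise 2."""
    diff = quad_form(x, mu1, sigma) - quad_form(x, mu2, sigma)
    return 1 if diff < k else 2


# Hypothetical parameter values, chosen only for the demonstration.
SIGMA = [[2.0, 0.5], [0.5, 1.0]]
MU1, MU2 = [0.0, 0.0], [2.0, 1.0]
near_mu1 = decide_known([0.1, 0.2], MU1, MU2, SIGMA)
near_mu2 = decide_known([1.9, 0.8], MU1, MU2, SIGMA)
```

With $k = 0$ the sketch assigns an observation to the population whose mean is closer in the metric of $\Sigma^{-1}$.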

However, in practice not all the parameters are known, and in
order to differentiate the two populations random (training) samples
from both the populations are obtained. It may be remarked that if
either of $\mu_1$ and $\mu_2$ is known it is not necessary to draw samples
from both the populations.


Let $\theta$ stand for $(\mu, \mu_1, \mu_2, \Sigma)$, and

(1.2)  $\Theta_1 = \{\theta: \mu = \mu_1, (\mu_1, \mu_2, \Sigma) \in \Omega\},$

(1.3)  $\Theta_2 = \{\theta: \mu = \mu_2, (\mu_1, \mu_2, \Sigma) \in \Omega\},$

where $\Omega$ is a known set in the space of the $\mu_i$ and $\Sigma$. It may be
noted that in order to control (arbitrarily) both probabilities of
incorrect classification certain conditions must be imposed on $\Omega$
and sequential sampling schemes may have to be used [5]. However, in
standard practice $\Omega$ is taken to be the set

(1.4)  $\Omega = \{(\mu_1, \mu_2, \Sigma): \mu_1, \mu_2 \in R^p, \mu_1 \neq \mu_2, \Sigma \text{ is positive-definite}\}.$


Following Fisher [7] a set of heuristic rules (called plug-in
rules) may be devised by first choosing some good estimates of the
unknown parameters and replacing the unknown parameters in $\phi^k$ by
their respective estimates. We shall call such a rule $\phi_P^k$ when
the standard estimates are used.

Let $X_{i1}, \ldots, X_{in_i}$ denote the $X$-observations of the training
sample from $\pi_i$ $(i = 1,2)$. Define (assume $n_1 + n_2 - 2 > 0$)

(1.5)  $\bar{X}_i = \sum_{j=1}^{n_i} X_{ij}/n_i$  $(i = 1,2)$,

       $S = \Bigl[\sum_{j=1}^{n_1} (X_{1j} - \bar{X}_1)(X_{1j} - \bar{X}_1)' + \sum_{j=1}^{n_2} (X_{2j} - \bar{X}_2)(X_{2j} - \bar{X}_2)'\Bigr]/(n_1 + n_2 - 2).$

When all the parameters are unknown, Fisher's plug-in rules are
given by the following: The rule $\phi_P^k$ decides $\mu = \mu_1$ iff

(1.6)  $(X - \bar{X}_1)' S^{-1} (X - \bar{X}_1) - (X - \bar{X}_2)' S^{-1} (X - \bar{X}_2) < k.$

Using the likelihood-ratio principle Anderson [1] proposed
the following rules when $(\mu_1, \mu_2, \Sigma)$ lies in $\Omega$ given by (1.4):
The rule $\phi_L^{\lambda}$ decides $\mu = \mu_1$ iff

(1.7)  $(1 + 1/n_1)^{-1} (X - \bar{X}_1)' S^{*-1} (X - \bar{X}_1) - \lambda (1 + 1/n_2)^{-1} (X - \bar{X}_2)' S^{*-1} (X - \bar{X}_2) < \lambda - 1,$

where $S^* = mS$, $m = n_1 + n_2 - 2$ $(\lambda > 0)$. Note that
$\phi_L^1 = \phi_P^0$ when $n_1 = n_2$. The likelihood-ratio rules turn out
to be the following when $\Sigma$ is known: The rule $\phi_L^{*k}$ decides
$\mu = \mu_1$ iff

(1.8)  $(1 + 1/n_1)^{-1} (X - \bar{X}_1)' \Sigma^{-1} (X - \bar{X}_1) - (1 + 1/n_2)^{-1} (X - \bar{X}_2)' \Sigma^{-1} (X - \bar{X}_2) < k.$
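
The relation between the plug-in rule (1.6) and the likelihood-ratio rule (1.7) can be tried out numerically. The sketch below is restricted to $p = 1$ for brevity; the function names and the two small training samples are hypothetical and serve only to illustrate that the two rules agree when $n_1 = n_2$, $k = 0$ and $\lambda = 1$.

```python
from statistics import fmean


def pooled_variance(s1, s2):
    """S of (1.5) for p = 1: pooled sum of squares over n1 + n2 - 2."""
    m1, m2 = fmean(s1), fmean(s2)
    ss = sum((x - m1) ** 2 for x in s1) + sum((x - m2) ** 2 for x in s2)
    return ss / (len(s1) + len(s2) - 2)


def plug_in_decides_1(x, s1, s2, k=0.0):
    """Rule (1.6) for p = 1: True when the rule decides mu = mu_1."""
    s = pooled_variance(s1, s2)
    return ((x - fmean(s1)) ** 2 - (x - fmean(s2)) ** 2) / s < k


def lr_decides_1(x, s1, s2, lam=1.0):
    """Rule (1.7) for p = 1, with S* = m S and m = n1 + n2 - 2."""
    n1, n2 = len(s1), len(s2)
    s_star = (n1 + n2 - 2) * pooled_variance(s1, s2)
    q1 = (x - fmean(s1)) ** 2 / s_star / (1 + 1 / n1)
    q2 = (x - fmean(s2)) ** 2 / s_star / (1 + 1 / n2)
    return q1 - lam * q2 < lam - 1


# Hypothetical training samples of equal size (n1 = n2 = 4).
sample1 = [0.2, -0.5, 1.1, 0.4]
sample2 = [2.9, 3.4, 2.2, 3.3]
grid = [i / 10 for i in range(-20, 51)]
agree = all(
    plug_in_decides_1(x, sample1, sample2) == lr_decides_1(x, sample1, sample2)
    for x in grid
)
```

For unequal sample sizes the factors $(1 + 1/n_i)^{-1}$ differ, and the two rules need not agree near the boundary.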

One may also derive some "good" constructive rules from various
optimality criteria. In this paper we shall obtain some good rules
from Wald's decision-theoretic viewpoint, and also from the asymmetrical
Neyman-Pearson approach. We shall also study the above two classes
of heuristic rules from some optimality criteria.

2.  The Univariate Case

2.1.  $p = 1$, $\sigma^2$ is known.  Without any loss of generality we shall
assume that $\sigma^2 = 1$. Let $\phi = (\phi_1, \phi_2)$ stand for a decision
rule, where $\phi_i$ is the probability of deciding $\mu = \mu_i$ given the
observations. We shall consider only the rules based on the sufficient
statistics $X$, $\bar{X}_1$ and $\bar{X}_2$.

First we shall make an orthogonal transformation as follows: Define

(2.1)  $U_1 = k_1[(1 + 1/n_1)^{-1/2}(X - \bar{X}_1) + (1 + 1/n_2)^{-1/2}(X - \bar{X}_2)],$

(2.2)  $U_2 = k_2[(1 + 1/n_1)^{-1/2}(X - \bar{X}_1) - (1 + 1/n_2)^{-1/2}(X - \bar{X}_2)],$

(2.3)  $U_3 = k_3[X + n_1\bar{X}_1 + n_2\bar{X}_2],$

where the $k_i$'s are chosen so that $\mathrm{var}(U_i) = 1$; $i = 1,2,3$. Note that
the $U_i$'s are independently distributed. Let $E(U_i) = \nu_i$. Then
$U_i \sim N(\nu_i, 1)$.
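
The normalizing constants in (2.1)-(2.3) are not written out in the text. The sketch below computes them under the stated assumptions ($\sigma^2 = 1$, with $X$, $\bar{X}_1$, $\bar{X}_2$ independent and of variances $1$, $1/n_1$, $1/n_2$) and checks that the resulting $U_i$ have an identity covariance matrix. The closed forms for $k_1$, $k_2$, $k_3$ are derived here for illustration and are not quoted from the source; the function names are hypothetical.

```python
def constants(n1, n2):
    """Return (k1, k2, k3, c) for (2.1)-(2.3) when sigma^2 = 1."""
    a1 = (1 + 1 / n1) ** -0.5
    a2 = (1 + 1 / n2) ** -0.5
    rho = a1 * a2  # covariance of the two standardized differences
    k1 = (2 * (1 + rho)) ** -0.5
    k2 = (2 * (1 - rho)) ** -0.5
    k3 = (1 + n1 + n2) ** -0.5
    return k1, k2, k3, k2 / k1


def coefficient_rows(n1, n2):
    """Coefficients of (X, Xbar_1, Xbar_2) in U_1, U_2 and U_3."""
    a1 = (1 + 1 / n1) ** -0.5
    a2 = (1 + 1 / n2) ** -0.5
    k1, k2, k3, _ = constants(n1, n2)
    return [
        [k1 * (a1 + a2), -k1 * a1, -k1 * a2],
        [k2 * (a1 - a2), -k2 * a1, k2 * a2],
        [k3, k3 * n1, k3 * n2],
    ]


def covariance(n1, n2):
    """Covariance matrix of (U_1, U_2, U_3)."""
    var = [1.0, 1 / n1, 1 / n2]  # Var(X), Var(Xbar_1), Var(Xbar_2)
    rows = coefficient_rows(n1, n2)
    return [
        [sum(ri[t] * rj[t] * var[t] for t in range(3)) for rj in rows]
        for ri in rows
    ]


cov = covariance(5, 8)
c_ratio = constants(5, 8)[3]
```

Since $k_2 > k_1$ for any finite sample sizes, the ratio $c = k_2/k_1$ returned by `constants` exceeds 1.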

In terms of $(\nu_1, \nu_2, \nu_3)$ the sets $\Theta_1$ and $\Theta_2$, as defined
in (1.2)-(1.4), are transformed as follows:

(2.4)  $\Omega_1 = \{(\nu_1, \nu_2, \nu_3): \nu_1 = \nu, \nu_2 = -c\nu, \nu \neq 0, \nu_3 \in R\},$

(2.5)  $\Omega_2 = \{(\nu_1, \nu_2, \nu_3): \nu_1 = \nu, \nu_2 = c\nu, \nu \neq 0, \nu_3 \in R\},$

where $c = k_2/k_1 > 0$. (The $k_i$'s are chosen to be positive.) Note that $c > 1$.

2.1.1  Bayes Rules and Minimax Rules

It is easy to see that by taking a suitable prior distribution of $\nu_3$
independently of $\nu_1$ and $\nu_2$, we can get Bayes rules free from $U_3$.
Hence we shall only consider prior distributions of $(\nu_1, \nu_2)$ and drop
$U_3$ from the argument of $\phi$. Let

(2.6)  $\Omega_i^* = \{(\nu_1, \nu_2): \nu_1 = \nu, \nu_2 = (-1)^i c\nu, \nu \neq 0\}.$

Consider a prior distribution $\xi(\beta, \gamma, \nu_0)$ which assigns
probabilities $\beta\gamma$, $(1-\beta)(1-\gamma)$, $\beta(1-\gamma)$,
$\gamma(1-\beta)$ to the parameter points $(\nu_0, c\nu_0)$, $(-\nu_0, -c\nu_0)$,
$(\nu_0, -c\nu_0)$, $(-\nu_0, c\nu_0)$, respectively, where $0 < \beta < 1$,
$0 < \gamma < 1$, $\nu_0 > 0$.

It can be seen that the unique (a.e.) Bayes rule (for zero-one loss function)
against the above prior distribution is given by the following: Decide
$(\nu_1, \nu_2, \nu_3) \in \Omega_1$ iff

(2.7)  $(U_1 - c_1)(U_2 - c_2) < 0,$

where $c_1$ and $c_2$ are functions of $\nu_0$, $\beta$, $\gamma$ and $c$.
Conversely, given $c_1$ and $c_2$ it is possible to choose $\beta$, $\gamma$ and
$\nu_0$ appropriately. Another class of Bayes rules may be obtained from the
following prior distributions: The probability that
$(\nu_1, \nu_2) \in \Omega_i^*$ is $\xi_i$, and given that $\nu_1 = \nu$,
$\nu_2 = (-1)^i c\nu$ the distribution of $\nu$ is $N(0, \tau^2)$. The unique
(a.e.) Bayes rule against the above prior distribution decides
$(\nu_1, \nu_2, \nu_3) \in \Omega_1$ iff

(2.8)  $U_1 U_2 < k,$

where $k$ is a function of $\xi_1$, $\tau$ and $c$. Different types of Bayes rules
are given by Das Gupta and Bhattacharya [3].
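
The product form of (2.7) can be checked against a direct posterior comparison. The sketch below assumes the four-point prior described above and unit-variance normal likelihoods for $(U_1, U_2)$. The explicit cutoffs $c_1 = \log\{(1-\beta)/\beta\}/(2\nu_0)$ and $c_2 = \log\{(1-\gamma)/\gamma\}/(2c\nu_0)$ are derived here for illustration and are not quoted from the source; the function names and parameter values are hypothetical.

```python
import math


def posterior_weights(u1, u2, beta, gamma, nu0, c):
    """Unnormalized posterior mass of Omega_1 and Omega_2 at (u1, u2)."""

    def lik(v1, v2):
        # N(v1, 1) x N(v2, 1) likelihood, up to a factor free of (v1, v2)
        return math.exp(v1 * u1 + v2 * u2 - (v1 * v1 + v2 * v2) / 2)

    w2 = (beta * gamma * lik(nu0, c * nu0)
          + (1 - beta) * (1 - gamma) * lik(-nu0, -c * nu0))
    w1 = (beta * (1 - gamma) * lik(nu0, -c * nu0)
          + gamma * (1 - beta) * lik(-nu0, c * nu0))
    return w1, w2


def bayes_decides_1(u1, u2, beta, gamma, nu0, c):
    """Decide Omega_1 when its posterior mass is the larger one."""
    w1, w2 = posterior_weights(u1, u2, beta, gamma, nu0, c)
    return w1 > w2


def cutoffs(beta, gamma, nu0, c):
    """Cutoffs (c1, c2) of rule (2.7), as derived in the lead-in."""
    c1 = math.log((1 - beta) / beta) / (2 * nu0)
    c2 = math.log((1 - gamma) / gamma) / (2 * c * nu0)
    return c1, c2


def product_rule_decides_1(u1, u2, beta, gamma, nu0, c):
    """Rule (2.7): decide Omega_1 iff (u1 - c1)(u2 - c2) < 0."""
    c1, c2 = cutoffs(beta, gamma, nu0, c)
    return (u1 - c1) * (u2 - c2) < 0


grid = [i * 0.25 for i in range(-12, 13)]
args = (0.3, 0.6, 0.8, 1.5)  # hypothetical beta, gamma, nu0, c
same = all(
    bayes_decides_1(u1, u2, *args) == product_rule_decides_1(u1, u2, *args)
    for u1 in grid
    for u2 in grid
)
```

With $\beta = \gamma = 1/2$ both cutoffs vanish and the rule reduces to $U_1 U_2 < 0$.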

Now consider the rule which decides $(\nu_1, \nu_2, \nu_3) \in \Omega_1$ iff

(2.9)  $U_1 U_2 < 0.$

Note that (2.9) is equivalent to

(2.10)  $(1 + 1/n_1)^{-1}(X - \bar{X}_1)^2 < (1 + 1/n_2)^{-1}(X - \bar{X}_2)^2.$

Thus the above rule is the same as $\phi_L^{*0}$, defined in (1.8). The rule
$\phi_L^{*0}$ is the unique Bayes rule against the prior
$\xi(\tfrac12, \tfrac12, \nu_0)$ for any $\nu_0 > 0$. Moreover, the risk of the
rule $\phi_L^{*0}$ is constant over the four-point set $(\nu_0, c\nu_0)$,
$(-\nu_0, -c\nu_0)$, $(-\nu_0, c\nu_0)$, $(\nu_0, -c\nu_0)$. Hence $\phi_L^{*0}$
is an admissible minimax rule, and moreover the supremum of the risk of
$\phi_L^{*0}$ is equal to $\tfrac12$.

However, $\phi_L^{*0}$ is not the unique minimax rule (leaving aside the
trivial rule $\phi_1 \equiv \phi_2 \equiv \tfrac12$). To see this, transform
$(U_1, U_2)$ to $(V_1, V_2)$ by an orthogonal transformation $L$ such that
$(EV_1, EV_2)$ is proportional to $(1, -d_1)$ and $(1, d_2)$ for
$(\nu_1, \nu_2) \in \Omega_1^*$ and $(\nu_1, \nu_2) \in \Omega_2^*$, respectively,
and $d_1 > 0$, $d_2 > 0$. Let $\tilde{\phi}$ be the rule which decides
$(\nu_1, \nu_2) \in \Omega_1^*$ iff $V_1 V_2 < 0$. It can be easily seen
(or, see [6]) that the supremum of the risk of $\tilde{\phi}$ is $\tfrac12$.
Note that there are many such orthogonal transformations $L$ which will
satisfy the desired property for $(EV_1, EV_2)$. It may be shown that neither
of the rules $\phi_L^{*0}$ and $\tilde{\phi}$ dominates the other. However, the
characterization of the class of all admissible minimax rules is not known.

Now, instead of the zero-one loss function consider a loss function
which takes the value 0 for correct decisions and equals
$\ell(|\mu_1 - \mu_2|)$ for any incorrect decision, where $\ell$ is a
positive-valued, bounded, continuous function such that
$\ell(\Delta) \to 0$ as $\Delta \to 0$. Das Gupta and Bhattacharya [3]
have shown that $\phi_L^{*0}$ is the unique minimax rule (and Bayes admissible)
for the above loss function when $n_1 = n_2$.

It is clear that neither of $\phi_L^{*0}$ and $\phi_P^0$ dominates the other.
It is believed that $\phi_P^0$ is also admissible.

2.1.2  Invariant Rules.  Let us now consider the following conditions on
the rules based on $(U_1, U_2, U_3)$.

Translation invariance:

(2.11)  $\phi(u_1, u_2, u_3) = \phi(u_1, u_2, u_3 + b)$
for all $u_1$, $u_2$, $u_3$ and $b \in R$.

A set of maximal invariants for (2.11) is given by $(U_1, U_2)$. Hence
we shall write a translation-invariant rule as a function of $U_1$ and $U_2$.

Sign invariance:

(2.12)  $\phi(u_1, u_2, u_3) = \phi(-u_1, -u_2, -u_3)$
for all $u_1$, $u_2$, $u_3$.

A translation-invariant rule is sign-invariant iff it is a function of
$(U_1 U_2/|U_2|, |U_2|)$ [10].

Symmetry:

(2.13)  $\phi_1(u_1, -u_2, u_3) = \phi_2(u_1, u_2, u_3)$
for all $u_1$, $u_2$ and $u_3$.


It is clear that both $\Omega_1$ and $\Omega_2$ are unchanged under the
transformations $(u_1, u_2, u_3) \to (u_1, u_2, u_3 + c)$ and
$(u_1, u_2, u_3) \to (-u_1, -u_2, -u_3)$. In terms of $x$, $\bar{x}_1$ and
$\bar{x}_2$ these transformations are respectively
$(x, \bar{x}_1, \bar{x}_2) \to (x + b, \bar{x}_1 + b, \bar{x}_2 + b)$ and
$(x, \bar{x}_1, \bar{x}_2) \to (-x, -\bar{x}_1, -\bar{x}_2)$.
The sets $\Omega_1$ and $\Omega_2$ are interchanged under the transformation
$(u_1, u_2, u_3) \to (u_1, -u_2, u_3)$. This transformation is obtained by
interchanging $(\bar{x}_1, n_1)$ and $(\bar{x}_2, n_2)$. We shall now show that
$\phi_L^{*0}$ is the uniformly best translation-invariant, sign-invariant
symmetric rule. For $(\nu_1, \nu_2, \nu_3) \in \Omega_1$ ($\nu_1 = \nu$,
$\nu_2 = -c\nu$) the risk of a translation-invariant, sign-invariant
symmetric rule $\phi$ is given by

(2.14)  $\frac{1}{4} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \bigl[\phi_2(u_1,u_2)\,n(u_1;\nu)\,n(u_2;-c\nu) + \phi_2(u_1,u_2)\,n(u_1;-\nu)\,n(u_2;c\nu)$
        $\quad + \{1 - \phi_2(u_1,u_2)\}\,n(u_1;-\nu)\,n(u_2;-c\nu) + \{1 - \phi_2(u_1,u_2)\}\,n(u_1;\nu)\,n(u_2;c\nu)\bigr]\,du_1\,du_2,$

where $n(u;\nu)$ is the density of $N(\nu, 1)$ at $u$. It may be seen that (2.14)
is minimum (uniformly in $\nu$ and $\nu_3$) for $\phi_2(u_1, u_2) = 1$ when
$u_1 u_2 > 0$.
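
The minimizing choice in (2.14) can be checked with a short calculation (a sketch, using only the identity $n(u;\nu) = n(u;0)\,e^{\nu u - \nu^2/2}$). Collecting the terms that multiply $\phi_2(u_1, u_2)$ in the integrand gives

```latex
\begin{aligned}
& n(u_1;\nu)\,n(u_2;-c\nu) + n(u_1;-\nu)\,n(u_2;c\nu)
  - n(u_1;-\nu)\,n(u_2;-c\nu) - n(u_1;\nu)\,n(u_2;c\nu) \\
&\quad = n(u_1;0)\,n(u_2;0)\,e^{-\nu^2(1+c^2)/2}
  \bigl[\,2\cosh(\nu u_1 - c\nu u_2) - 2\cosh(\nu u_1 + c\nu u_2)\bigr] \\
&\quad = -4\,n(u_1;0)\,n(u_2;0)\,e^{-\nu^2(1+c^2)/2}\,
  \sinh(\nu u_1)\,\sinh(c\nu u_2).
\end{aligned}
```

Since $c > 0$, the product $\sinh(\nu u_1)\sinh(c\nu u_2)$ has the sign of $u_1 u_2$ for every $\nu \neq 0$. The coefficient of $\phi_2$ is therefore negative exactly when $u_1 u_2 > 0$, so the integrand is smallest for $\phi_2 = 1$ there and $\phi_2 = 0$ when $u_1 u_2 < 0$, which is the rule (2.9).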

The above result can also be proved using the distribution of
$(U_1 U_2/|U_2|, |U_2|)$ [10].

Kinderman [10] characterized the (essential) complete class among all
translation-invariant, sign-invariant rules when $n_1 = n_2$.

2.1.3  Best Invariant Similar Test.  The classification problem may be viewed
in the light of Neyman-Pearson theory. We may pose the problem as testing
the hypothesis $H_1$: $\theta \in \Theta_1$ against the alternative
$H_2$: $\theta \in \Theta_2$. We restrict our attention to the class of tests
which are translation-invariant and sign-invariant. Let $\psi$ be a test
function, i.e. $\psi(X, \bar{X}_1, \bar{X}_2)$ is the probability of rejecting
$H_1$ given $X$, $\bar{X}_1$ and $\bar{X}_2$. Define

(2.15)  $Y_1 = (1 + 1/n_1)^{-1/2}(X - \bar{X}_1),$

(2.16)  $Y_2 = [(1 + 1/n_2)^{-1/2}(X - \bar{X}_2) - (1 + 1/n_2)^{-1/2}(1 + 1/n_1)^{-1}(X - \bar{X}_1)]\,d,$

(2.17)  $Y_3 = (1 + n_1 + n_2)^{-1/2}(X + n_1\bar{X}_1 + n_2\bar{X}_2),$

where $d$ is a constant chosen appropriately to make $\mathrm{Var}(Y_2) = 1$. If
$\psi$ is translation-invariant it will depend only on $Y_1$ and $Y_2$.
Furthermore the sign-invariance of $\psi$ means

(2.18)  $\psi(y_1, y_2) = \psi(-y_1, -y_2).$

Under $H_1$ the means of $Y_1$ and $Y_2$ are given by

(2.19)  $\delta_1 \equiv EY_1 = 0, \quad \delta_2 \equiv EY_2 = d(1 + 1/n_2)^{-1/2}(\mu_1 - \mu_2).$

Similarly, the means of $Y_1$ and $Y_2$ under $H_2$ are given by

(2.20)  $\delta_1 = (1 + 1/n_1)^{-1/2}(\mu_2 - \mu_1), \quad \delta_2 = -d(1 + 1/n_2)^{-1/2}(1 + 1/n_1)^{-1}(\mu_2 - \mu_1).$

In terms of $\delta_1$ and $\delta_2$, the parameter sets may be expressed as

(2.21)  $\Delta_1 = \{(\delta_1, \delta_2): \delta_1 = 0, \delta_2 \neq 0\},$

(2.22)  $\Delta_2 = \{(\delta_1, \delta_2): \delta_2 = a\delta_1 \neq 0\}$

under $H_1$ and $H_2$, respectively; $a = -d(1 + 1/n_1)^{-1/2}(1 + 1/n_2)^{-1/2}$.
Since $\delta_2$ is still unknown under $H_1$ we require $\psi$ to be similar
of size $\alpha$ for $H_1$, i.e.

(2.23)  $E_{0,\delta_2}\,\psi(Y_1, Y_2) = \alpha$ for all $\delta_2 \neq 0$.

This is equivalent to

(2.24)  $\int_{-\infty}^{\infty} \psi(y_1, y_2)\,n(y_1; 0)\,dy_1 = \alpha$  a.e. $(y_2)$.

The power of the test $\psi$ is given by

(2.25)  $E_{\delta_1, a\delta_1}\,\psi(Y_1, Y_2) = \frac{1}{2} \int\!\!\int e^{-\delta_1^2(1+a^2)/2}\,n(y_1;0)\,n(y_2;0)\,\psi(y_1,y_2)\,\bigl\{e^{\delta_1(y_1 + a y_2)} + e^{-\delta_1(y_1 + a y_2)}\bigr\}\,dy_1\,dy_2.$

Using the Neyman-Pearson Lemma in order to maximize

(2.26)  $\int_{-\infty}^{\infty} \psi(y_1,y_2)\,\bigl\{e^{\delta_1(y_1 + a y_2)} + e^{-\delta_1(y_1 + a y_2)}\bigr\}\,n(y_1;0)\,dy_1$

subject to (2.24) we get the following optimum test:

(2.27)  $\psi^*(y_1, y_2) = 1$ iff $|y_1 + a y_2| > k(y_2)$,

where $k(y_2)$ is chosen so that

(2.28)  $\int_{-a y_2 - k(y_2)}^{-a y_2 + k(y_2)} n(y_1; 0)\,dy_1 = 1 - \alpha.$

Thus $\psi^*$ is the uniformly most powerful invariant similar test. The above
result is due to Schaafsma [12].
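
The cutoff $k(y_2)$ in (2.28) has no closed form in general, but it is easy to obtain numerically. The following sketch solves (2.28) by bisection using the standard normal distribution function from Python's `statistics.NormalDist`; the function names, the tolerance and the value $a = -0.7$ are illustrative assumptions.

```python
from statistics import NormalDist

_PHI = NormalDist().cdf  # standard normal distribution function


def k_of_y2(y2, a, alpha, tol=1e-12):
    """Solve (2.28) for k(y2) > 0 by bisection.

    The interval [-a*y2 - k, -a*y2 + k] must carry N(0, 1) mass 1 - alpha.
    """
    center = -a * y2

    def mass(k):
        return _PHI(center + k) - _PHI(center - k)

    lo, hi = 0.0, 1.0
    while mass(hi) < 1 - alpha:  # enlarge the bracket until it holds the root
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mass(mid) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def psi_star(y1, y2, a, alpha):
    """Test (2.27): return 1 (reject H_1) iff |y1 + a*y2| > k(y2)."""
    return 1 if abs(y1 + a * y2) > k_of_y2(y2, a, alpha) else 0


k_at_zero = k_of_y2(0.0, -0.7, 0.05)
```

At $y_2 = 0$ the interval is centered at the origin, so $k(0)$ is the usual two-sided normal critical value; for other $y_2$ the interval is shifted and $k(y_2)$ grows.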


2.2  The common variance $\sigma^2$ is unknown

It may be easily seen that the rules given by (2.7) and (2.8) are still
unique Bayes. Moreover, the rule $\phi_L^1$ is the one which accepts
$\theta \in \Theta_1$ if (2.10) holds and it is admissible minimax. When
$n_1 = n_2$ Das Gupta and Bhattacharya [3] have shown that the rule $\phi_L^1$
is the unique (a.e.) minimax when the loss for incorrect decision is
$\ell(|\mu_1 - \mu_2|/\sigma)$, where $\ell$ is a positive-valued, bounded,
continuous function such that $\ell(\Delta) \to 0$ as $\Delta \to 0$.
To see all the above results, note that $(U_1, U_2, U_3, S)$ are sufficient
statistics in this case and $S$ is distributed independently of
$(U_1, U_2, U_3)$. It also follows that $\phi_L^1$ is the uniformly best
translation-invariant, symmetric rule. To see this, condition on $S$ and
fix $\sigma$.

Schaafsma [13] has shown that the following critical region for testing
$H_1$ against $H_2$ is (i) similar of size $\alpha$ for $H_1$, (ii) unbiased
for $H_2$, and (iii) asymptotically (as $\min(n_1, n_2) \to \infty$) most
stringent among all level $\alpha$ tests:

(2.29)  $Y_1\,\mathrm{sign}(Y_2) > \sqrt{S}\;t_{n_1+n_2-2,\,\alpha},$

where $Y_1$ and $Y_2$ are given in (2.15) and (2.16), $S$ is given in (1.5),
and $t_{n_1+n_2-2,\,\alpha}$ is the upper $100\alpha\%$ point of the Student's
$t$ distribution with $n_1 + n_2 - 2$ degrees of freedom. However, it is very
likely that this test is not admissible.

It follows from Kiefer and Schwartz [9] that the rule $\phi_L^{\lambda}$ is a
(unique) Bayes rule. We shall give a sketch of the prior distribution against
which $\phi_L^{\lambda}$ is unique Bayes. Consider $U_i$ as defined in
(2.1)-(2.3). Then the $U_i$'s are independently distributed, and
$U_i \sim N(\nu_i, \sigma^2)$. Moreover, under $\theta \in \Theta_i$ (i.e.
$(\nu_1, \nu_2, \nu_3) \in \Omega_i$) we have $\nu_1 = \nu$,
$\nu_2 = (-1)^i c\nu$, $\nu \neq 0$. The prior distribution is given as follows:

(i)  $P(\theta \in \Theta_i) = \xi_i$, $i = 1,2$.

(ii)  Given $\theta \in \Theta_i$, the conditional distribution of
$(\nu, \nu_3, \sigma^2)$ is derived from the following:

(iia)  Given $\sigma^2 = (1 + \tau^2)^{-1}$, the conditional distribution of
$(\nu/\sigma^2, \nu_3/\sigma^2)$ is the same as that of $(\tau V, \tau V_3)$,
where $V$ and $V_3$ are independently distributed with
$V \sim N(0, (1 + \tau^2)/(1 + c^2))$ and $V_3 \sim N(0, 1 + \tau^2)$.

(iib)  The density of $\tau$ is proportional to $(1 + \tau^2)^{-(m+1)/2}$.
3.  Multivariate Case: $\Sigma$ known

Without any loss of generality we shall assume that $\Sigma = I_p$. First
we shall derive a class of Bayes rules and obtain an admissible minimax
rule. Define $U_1$, $U_2$, $U_3$ and $k_1$, $k_2$ as in (2.1)-(2.3), except
that the $U_i$'s are now $p \times 1$ vectors and $U_i \sim N_p(\nu_i, I_p)$.
Correspondingly redefine the sets $\Omega_i$ as follows:

(3.1)  $\Omega_i = \{(\nu_1, \nu_2, \nu_3): \nu_1 = \nu, \nu_2 = (-1)^i c\nu, \nu \neq 0; \nu, \nu_3 \in R^p\},$

$i = 1,2$. As before $U_3$ may be eliminated from a Bayes rule by taking
a fixed distribution of $\nu_3$, independent of $(\nu_1, \nu_2)$, under both
$\Omega_1$ and $\Omega_2$. Now consider the prior distribution which assigns the
probability $\xi_i$ to $\Omega_i$ and, given $\nu_1 = \nu$,
$\nu_2 = (-1)^i c\nu$, the distribution of $\nu$ is $N_p(0, \tau^2 I_p)$. It can
now be seen that the unique (a.e.) Bayes rule against the above prior
distribution decides $(\nu_1, \nu_2, \nu_3) \in \Omega_1$ iff

(3.2)  $U_1' U_2 \le k,$

where $k$ is a function of $\xi_1$ and $\tau$; conversely, given $k$ the
probability $\xi_1$ and $\tau$ can be suitably chosen. Thus any likelihood-ratio
rule $\phi_L^{*k}$ is Bayes and admissible.

We shall now show that $\phi_L^{*0}$ is minimax. First we shall consider
a different prior distribution against which $\phi_L^{*0}$ is unique Bayes.
As before, $\nu_3$ can be eliminated from the problem. Now consider a
prior distribution which assigns equal probabilities to the sets
$\Omega_1^*$ and $\Omega_2^*$, where

(3.3)  $\Omega_i^* = \{(\nu_1, \nu_2): \nu_1 = \nu, \nu_2 = (-1)^i c\nu, \nu \neq 0, \nu \in R^p\}.$

Moreover, given that $(\nu_1, \nu_2) \in \Omega_i^*$ the distribution of $\nu$ is
taken to be uniform over the surface of the hypersphere
$\nu'\nu = \Delta^2$. See Das Gupta [4] to get a detailed proof of the fact
that $\phi_L^{*0}$ is unique (a.e.) Bayes against the above prior distribution.
To see that $\phi_L^{*0}$ is minimax, note that the risk of $\phi_L^{*0}$ is
constant over the set

(3.4)  $\{(\nu_1, \nu_2, \nu_3): \nu_1 = \nu, \nu_2 = -c\nu, \nu'\nu = \Delta^2\} \cup \{(\nu_1, \nu_2, \nu_3): \nu_1 = \nu, \nu_2 = c\nu, \nu'\nu = \Delta^2\}.$

Das Gupta [4] has also shown that the rule $\phi_L^{*0}$ is the unique (a.e.)
minimax when the loss for any correct decision is zero, and the loss for
deciding $\mu = \mu_i$ incorrectly is

(3.5)  $\ell[(1 + 1/n_i)^{-1} (\mu - \mu_i)' (\mu - \mu_i)],$

where $\ell$ is a positive-valued, bounded, continuous function such that
$\ell(\Delta) \to 0$ as $\Delta \to 0$.

As in (2.11) we may call a rule $\phi$ translation-invariant if

(3.6)  $\phi(U_1, U_2, U_3) = \phi(U_1, U_2, U_3 + b),$

for all $b \in R^p$. Clearly, $(U_1, U_2)$ is a set of maximal invariants. A
rule $\phi$ is called orthogonally-invariant if

(3.8)  $\phi(U_1, U_2, U_3) = \phi(OU_1, OU_2, OU_3),$

for all orthogonal $p \times p$ matrices $O$.

Kudo [11] considered the following "symmetry" condition for a
translation-invariant rule $\phi$:

(3.9)  $\beta_1(\phi; (1 + 1/n_2)^{-1/2} d) = \beta_2(\phi; (1 + 1/n_1)^{-1/2} d),$

where $\beta_i(\phi; d) = E\phi_i$ when $d = (\mu_1 - \mu_2)$ and $\mu = \mu_i$.
Moreover, he required $\beta_i(\phi; d)$ to depend on $d$ only through $d'd$.
This condition clearly holds if $\phi$ is translation-invariant and
orthogonally-invariant. Note also that for a translation-invariant and
orthogonally-invariant rule $\phi$ satisfying (2.13) the condition (3.9)
holds. Kudo [11] has shown that $\phi_L^{*0}$ simultaneously maximizes both
$\beta_1(\phi; d)$ and $\beta_2(\phi; d)$ in the class of all
translation-invariant rules satisfying (3.9) and for which
$\beta_i(\phi; d)$ depends on $d$ only through $d'd$. This can be seen easily
by integrating the probability of correct classification with respect to
the uniform distribution of $\nu$ over $\nu'\nu = \Delta^2$, where
$\nu_1 = \nu$, $\nu_2 = (-1)^i c\nu$.

Rao [15] has considered the class $\mathcal{C}^*$ of rules whose probabilities
of misclassification depend only on

(3.10)  $\Delta^2 = (\mu_1 - \mu_2)' \Sigma^{-1} (\mu_1 - \mu_2).$

For a rule $\phi \in \mathcal{C}^*$ let $G_1(\phi; \Delta^2)$ and
$G_2(\phi; \Delta^2)$ be the error probabilities when $\mu = \mu_1$ and
$\mu = \mu_2$, respectively. Rao [15] has posed the problem of minimizing

(3.11)  $\frac{d}{d\Delta}\bigl\{aG_1(\phi; \Delta^2) + bG_2(\phi; \Delta^2)\bigr\}\Big|_{\Delta = 0}$

subject to the condition that the ratio of $G_1(\phi; 0)$ to $G_2(\phi; 0)$ is
equal to some specified constant. The resulting optimum rule decides
$\mu = \mu_1$ iff

(3.12)  $a[(X - \bar{X}_1) - (1 + 1/n_1)(X - \bar{X}_2)]'\,[(X - \bar{X}_1) - (1 + 1/n_1)(X - \bar{X}_2)]$
        $\quad - b[(1 + 1/n_2)(X - \bar{X}_1) - (X - \bar{X}_2)]'\,[(1 + 1/n_2)(X - \bar{X}_1) - (X - \bar{X}_2)] > k.$

The above rule coincides with $\phi_L^{*0}$ when $n_1 = n_2$ and $a = b$, $k = 0$.
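
The stated coincidence of (3.12) with the likelihood-ratio rule (1.8) can be checked numerically. The sketch below takes $\Sigma = I_p$ with $p = 3$ and compares the two decisions at randomly drawn points; the function names, the sample means and the sample size are hypothetical.

```python
import random


def rao_decides_1(x, xbar1, xbar2, n1, n2, a=1.0, b=1.0, k=0.0):
    """Rule (3.12) with Sigma = I_p: True when it decides mu = mu_1."""
    d1 = [xi - m for xi, m in zip(x, xbar1)]
    d2 = [xi - m for xi, m in zip(x, xbar2)]
    v = [p - (1 + 1 / n1) * q for p, q in zip(d1, d2)]
    w = [(1 + 1 / n2) * p - q for p, q in zip(d1, d2)]
    return a * sum(t * t for t in v) - b * sum(t * t for t in w) > k


def lr_known_decides_1(x, xbar1, xbar2, n1, n2, k=0.0):
    """Rule (1.8) with Sigma = I_p: True when it decides mu = mu_1."""
    q1 = sum((xi - m) ** 2 for xi, m in zip(x, xbar1)) / (1 + 1 / n1)
    q2 = sum((xi - m) ** 2 for xi, m in zip(x, xbar2)) / (1 + 1 / n2)
    return q1 - q2 < k


rng = random.Random(0)
XBAR1, XBAR2 = [0.0, 0.0, 0.0], [1.5, -0.5, 1.0]  # hypothetical sample means
points = [[rng.uniform(-3, 3) for _ in range(3)] for _ in range(500)]
agree = all(
    rao_decides_1(x, XBAR1, XBAR2, 6, 6) == lr_known_decides_1(x, XBAR1, XBAR2, 6, 6)
    for x in points
)
```

With $n_1 = n_2 = n$ and $r = 1 + 1/n$, the left side of (3.12) reduces to $(r^2 - 1)$ times the difference of the squared distances to $\bar{X}_2$ and $\bar{X}_1$, which is why the two rules agree.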
4.  Multivariate Case: $\Sigma$ unknown

First we shall show that a likelihood-ratio rule $\phi_L^{\lambda}$ is unique
(a.e.) Bayes and hence is admissible (for zero-one loss function). Note
that $U_1$, $U_2$, $U_3$ and $S$ are sufficient statistics in this case, where
the $U_i$'s (in $p \times 1$ vector notations) are given by (2.1)-(2.3) and $S$
is given by (1.5). Here $U_i \sim N_p(\nu_i, \Sigma)$. We now consider the
following prior distribution.

(i)  $P(\theta \in \Theta_i) = \xi_i$.

(ii)  Given $\theta \in \Theta_i$ (i.e., $\nu_1 = \nu$, $\nu_2 = (-1)^i c\nu$),
the conditional distribution of $(\nu, \nu_3, \Sigma)$ is derived from the
following:

(iia)  Given $\Sigma^{-1} = I_p + \tau\tau'$ ($\tau$: $p \times 1$), the
conditional distribution of $(\Sigma^{-1}\nu, \Sigma^{-1}\nu_3)$ is the same as
the distribution of $(\tau V, \tau V_3)$, where $V$ and $V_3$ are independently
distributed as $N(0, (1 + c^2)^{-1}(1 + \tau'\tau))$ and
$N(0, 1 + \tau'\tau)$, respectively.

(iib)  The density of $\tau$ is proportional to a negative power of
$(1 + \tau'\tau)$, where $m > p - 1$.

Following a simplified version of the results of Kiefer and Schwartz
[9] it can be shown that a unique (a.e.) Bayes rule against the
above prior distribution accepts $\mu = \mu_1$ if (1.7) holds, where $\lambda$
is a function of the $\xi_i$'s; conversely, given $\lambda$ the constants
$\xi_i$ can be appropriately chosen.

Das Gupta [4] has considered a class $\mathcal{C}^{**}$ of rules invariant
under the following transformations:

(4.1)  $(X, \bar{X}_1, \bar{X}_2, S) \to (AX + b, A\bar{X}_1 + b, A\bar{X}_2 + b, ASA'),$

where $A$ is any $p \times p$ nonsingular matrix and $b$ is any vector in $R^p$.
It is shown [4] that a set of maximal invariants is given by
$(m_{11}, m_{12}, m_{22})$, where

(4.2)  $m_{ij} = U_i' S^{-1} U_j/m.$

When $\nu_1 = \nu$, $\nu_2 = (-1)^i c\nu$, $\nu'\Sigma^{-1}\nu = \Delta^2$, the
joint density of $(m_{11}, m_{12}, m_{22})$ is given by [14]

(4.3)  $p_i(m_{11}, m_{12}, m_{22}; \Delta^2) = K \exp[-\Delta^2(1 + c^2)/2]\,|M|^{(p-3)/2} \sum_{j=0}^{\infty} g_j H_j^{(i)}(m_{11}, m_{12}, m_{22}),$

where

(4.4)  $H_j^{(i)} = \dfrac{\bigl(m_{11} + 2(-1)^i c\,m_{12} + c^2 m_{22} + (1 + c^2)|M|\bigr)^j}{|I_2 + M|^{(m+2)/2 + j}},$

(4.5)  $|M| = \det M, \quad M = \begin{pmatrix} m_{11} & m_{12} \\ m_{12} & m_{22} \end{pmatrix},$

(4.6)  $m = n_1 + n_2 - 2,$

and $K > 0$, $g_j > 0$ are numerical constants.

Consider a prior distribution which assigns equal probabilities to
$\Theta_1$ and $\Theta_2$ and, given $\theta \in \Theta_i$ (i.e. $\nu_1 = \nu$,
$\nu_2 = (-1)^i c\nu$), the value of $\nu'\Sigma^{-1}\nu = \Delta^2$ is held
fixed. The Bayes rule in $\mathcal{C}^{**}$ against the above prior
distribution decides $\theta \in \Theta_1$ iff

(4.7)  $m_{12} < 0.$

To see this, note that for $a > 0$

(4.8)  $(a + x)^j < (a - x)^j$

for any positive $j$ if $x < 0$. The relation (4.7) is the same as (1.7)
for $\lambda = 1$. It now follows easily that the rule $\phi_L^1$ is admissible
and minimax in $\mathcal{C}^{**}$ [4]. Das Gupta [4] has also shown that
$\phi_L^1$ is the unique (a.e.) minimax in $\mathcal{C}^{**}$ if the loss for
any correct decision is zero and the loss for deciding $\mu = \mu_i$
incorrectly is

(4.9)  $\ell[(1 + 1/n_i)^{-1} (\mu - \mu_i)' \Sigma^{-1} (\mu - \mu_i)],$

where $\ell$ is a positive-valued, bounded, continuous function such that
$\ell(\Delta) \to 0$ as $\Delta \to 0$.
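
The statement that (4.7) is the same as (1.7) for $\lambda = 1$ rests on the identity $(d_1 + d_2)'S^{-1}(d_1 - d_2) = d_1'S^{-1}d_1 - d_2'S^{-1}d_2$ for symmetric $S$, where $d_i = (1 + 1/n_i)^{-1/2}(X - \bar{X}_i)$. The sketch below checks this for $p = 2$ at random points. It drops the positive factors $k_1$, $k_2$ and $m$, which do not affect the sign; the function names, the matrix `S` and the sample means are hypothetical.

```python
import random


def inv2(s):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = s
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def bilinear(u, s_inv, v):
    """Return u' s_inv v for 2-vectors u and v."""
    return sum(u[i] * s_inv[i][j] * v[j] for i in range(2) for j in range(2))


def scaled_diffs(x, xbar1, xbar2, n1, n2):
    """d_i = (1 + 1/n_i)^(-1/2) (x - xbar_i), i = 1, 2."""
    a1, a2 = (1 + 1 / n1) ** -0.5, (1 + 1 / n2) ** -0.5
    d1 = [a1 * (xi - m) for xi, m in zip(x, xbar1)]
    d2 = [a2 * (xi - m) for xi, m in zip(x, xbar2)]
    return d1, d2


def m12_sign_proxy(x, xbar1, xbar2, s, n1, n2):
    """U_1' S^{-1} U_2 up to the positive constants k1, k2 and 1/m."""
    d1, d2 = scaled_diffs(x, xbar1, xbar2, n1, n2)
    u1 = [p + q for p, q in zip(d1, d2)]
    u2 = [p - q for p, q in zip(d1, d2)]
    return bilinear(u1, inv2(s), u2)


def lr_lambda1_decides_1(x, xbar1, xbar2, s, n1, n2):
    """Rule (1.7) with lambda = 1 (S* = m S only rescales both terms)."""
    d1, d2 = scaled_diffs(x, xbar1, xbar2, n1, n2)
    s_inv = inv2(s)
    return bilinear(d1, s_inv, d1) - bilinear(d2, s_inv, d2) < 0


rng = random.Random(1)
S = [[2.0, 0.6], [0.6, 1.0]]            # hypothetical pooled covariance
XB1, XB2 = [0.0, 0.0], [1.0, 2.0]       # hypothetical sample means
pts = [[rng.uniform(-4, 4), rng.uniform(-4, 4)] for _ in range(500)]
agree = all(
    (m12_sign_proxy(x, XB1, XB2, S, 5, 9) < 0)
    == lr_lambda1_decides_1(x, XB1, XB2, S, 5, 9)
    for x in pts
)
```

The agreement holds for unequal sample sizes as well, since the factors $(1 + 1/n_i)^{-1/2}$ are already absorbed into $d_1$ and $d_2$.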

Again for this case Rao [15] considered the class of rules whose
probabilities of misclassification depend only on $\Delta^2$ given in (3.10).
Then he derived the optimum rule which minimizes the expression given by
(3.11) subject to the condition of similarity for the subset of the
parameters given by $\mu_1 = \mu_2$. The optimum rule decides $\mu = \mu_1$ iff

(4.10)  $a[(X - \bar{X}_1) - (1 + 1/n_1)(X - \bar{X}_2)]'\,B^{-1}\,[(X - \bar{X}_1) - (1 + 1/n_1)(X - \bar{X}_2)]$
        $\quad - b[(X - \bar{X}_2) - (1 + 1/n_2)(X - \bar{X}_1)]'\,B^{-1}\,[(X - \bar{X}_2) - (1 + 1/n_2)(X - \bar{X}_1)] > c(B),$

where

(4.11)  $B = mS + \dfrac{n_1 n_2}{1 + n_1 + n_2}\bigl[(1 + 1/n_2)(X - \bar{X}_1)(X - \bar{X}_1)' + (1 + 1/n_1)(X - \bar{X}_2)(X - \bar{X}_2)'$
        $\quad - (X - \bar{X}_1)(X - \bar{X}_2)' - (X - \bar{X}_2)(X - \bar{X}_1)'\bigr].$

It is not clear why Rao imposed the similarity condition even after
restricting to the class $\mathcal{C}^*$. One may directly consider the class
of rules invariant under (4.1) and try to minimize (3.11) subject to the
condition that $G_1(\phi; 0)$ is equal to a specified constant. Using (4.3)
it can be found that the optimum rule decides $\mu = \mu_1$ iff

(4.12)  $a\bigl(k_1^2 m_{11} + k_2^2 m_{22} + (k_1^2 + k_2^2)|M| - 2k_1k_2 m_{12}\bigr)(1 + 1/n_1)^{-1}$
        $\quad - b\bigl(k_1^2 m_{11} + k_2^2 m_{22} + (k_1^2 + k_2^2)|M| + 2k_1k_2 m_{12}\bigr)(1 + 1/n_2)^{-1} > \lambda \det(I_2 + M).$

As in (2.29) a similar region for $H_1$ may be constructed for this case
also. It is given by the following:

(4.13)  $Y_1'(mS + Y_1Y_1')^{-1}Y_2 \big/ [Y_2'(mS + Y_1Y_1')^{-1}Y_2]^{1/2} > k,$

where $Y_1$ and $Y_2$ are given in (2.15) and (2.16) in vector notations.
5.  Multivariate Case: $\mu_1$ and $\mu_2$ known

In this case the plug-in rules are given by the following: Decide
$\mu = \mu_1$ if

(4.14)  $(X - \mu_1)' A^{-1} (X - \mu_1) - (X - \mu_2)' A^{-1} (X - \mu_2) < \lambda,$

where

(4.15)  $A = [mS + n_1(\bar{X}_1 - \mu_1)(\bar{X}_1 - \mu_1)' + n_2(\bar{X}_2 - \mu_2)(\bar{X}_2 - \mu_2)'].$

On the other hand, a likelihood-ratio rule decides $\mu = \mu_1$ iff

(4.16)  $\dfrac{1 + (X - \mu_2)' A^{-1} (X - \mu_2)}{1 + (X - \mu_1)' A^{-1} (X - \mu_1)} > \lambda \quad (0 < \lambda).$

Define $m^* = m + 2$.


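Without loss of generality we may assume that $\mu_1 = 0$ and
$\mu_2 = (1, 0, \ldots, 0)'$. Then the problem is invariant under the following
transformations:

(4.17)  $(X, A) \to (LX, LAL'),$

where $L$ is a nonsingular $p \times p$ matrix of the form

(4.18)  $L = \begin{pmatrix} 1 & L_{12} \\ 0 & L_{22} \end{pmatrix}.$

It can be seen that a set of maximal invariants is given by
$(X_{1.2},\; X_{(2)}' A_{22}^{-1} X_{(2)},\; A_{11.2})$, where

(4.19)  $X = \begin{pmatrix} X_1 \\ X_{(2)} \end{pmatrix}, \quad A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},$  with $X_1$: $1 \times 1$ and $X_{(2)}$: $(p-1) \times 1$,

(4.20)  $A_{11.2} = A_{11} - A_{12} A_{22}^{-1} A_{21}, \quad X_{1.2} = X_1 - A_{12} A_{22}^{-1} X_{(2)}.$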

$A_{11.2}$ is distributed, independently of
$(X_{1.2}, X_{(2)}' A_{22}^{-1} X_{(2)})$, as
$\sigma_{11.2}\,\chi^2_{m^*-p+1}$; given $X_{(2)}' A_{22}^{-1} X_{(2)}$, the
distribution of $X_{1.2}$ is
$N(d, \sigma_{11.2}(1 + X_{(2)}' A_{22}^{-1} X_{(2)}))$; and
$X_{(2)}' A_{22}^{-1} X_{(2)}$ is distributed as the ratio of independent
$\chi^2_{p-1}$ and $\chi^2_{m^*-p+2}$ variates. In the above $d$ is equal to 0
or 1 according as $\mu = \mu_1$ or $\mu = \mu_2$, and $\sigma_{11.2}$ is the
residual variance of $X_1$ given $X_{(2)}$. It can be shown now that the
following rule is minimax (and Bayes) in the class of rules invariant
under (4.18): Decide $\mu = \mu_1$ iff

(4.22)  $X_{1.2} < 1/2.$

The relation (4.22) is the same as (4.14) for $\lambda = 0$, and as (4.16) for
$\lambda = 1$. The above region is not similar for $\mu = \mu_1$. Such a similar
region may be constructed using

(4.23)  $X_{1.2}\,(1 + X_{(2)}' A_{22}^{-1} X_{(2)})^{-1/2}\,(A_{11.2})^{-1/2},$

which (apart from the constant factor $(m^* - p + 1)^{1/2}$) is distributed as
Student's $t$ with $m^* - p + 1$ degrees of freedom when $\mu = \mu_1$. The
Mahalanobis distance is equal to $(\sigma_{11.2})^{-1/2}$ in this case. The
probabilities of correct classification for the rule given by (4.22) are the
same and they decrease as $p$ increases if $\sigma_{11.2}$ is held fixed.

This section is new in the literature and it is due to the present
author.
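
The equivalence of (4.22) with (4.14) at $\lambda = 0$ and with (4.16) at $\lambda = 1$ can be checked numerically. The sketch below does so for $p = 2$ with $\mu_1 = 0$ and $\mu_2 = (1, 0)'$, reading (4.14) in the direction that agrees with (4.22), namely deciding $\mu = \mu_1$ when the difference of quadratic forms is below $\lambda$. The function names and the matrix `A` are hypothetical.

```python
import random

MU1, MU2 = [0.0, 0.0], [1.0, 0.0]


def inv2(a):
    """Inverse of a 2 x 2 matrix."""
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def quad(x, mu, a_inv):
    """Return (x - mu)' a_inv (x - mu) for 2-vectors."""
    d = [x[0] - mu[0], x[1] - mu[1]]
    return sum(d[i] * a_inv[i][j] * d[j] for i in range(2) for j in range(2))


def x_1_2(x, a):
    """X_{1.2} = X_1 - A_12 A_22^{-1} X_(2) of (4.20), for p = 2."""
    return x[0] - a[0][1] / a[1][1] * x[1]


def plug_in_decides_1(x, a, lam=0.0):
    """Rule (4.14): decide mu_1 when Q_1 - Q_2 < lam."""
    a_inv = inv2(a)
    return quad(x, MU1, a_inv) - quad(x, MU2, a_inv) < lam


def lr_decides_1(x, a, lam=1.0):
    """Rule (4.16): decide mu_1 when (1 + Q_2) / (1 + Q_1) > lam."""
    a_inv = inv2(a)
    return (1 + quad(x, MU2, a_inv)) / (1 + quad(x, MU1, a_inv)) > lam


rng = random.Random(2)
A = [[3.0, 1.2], [1.2, 2.0]]  # hypothetical positive-definite matrix
pts = [[rng.uniform(-3, 4), rng.uniform(-3, 3)] for _ in range(500)]
all_agree = all(
    plug_in_decides_1(x, A) == (x_1_2(x, A) < 0.5) == lr_decides_1(x, A)
    for x in pts
)
```

The check reflects the identity $Q_1 - Q_2 = (2X_{1.2} - 1)/A_{11.2}$, which follows from the form of the first row of $A^{-1}$.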